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Numerical algorithms for elliptic partial differential equations frequently em¬ 
ploy error estimators and adaptive mesh refinement strategies in order to 
reduce the computational cost. 

We can extend these techniques to general vectors by splitting the vectors 
into a hierarchically organized partition of subsets and using appropriate 
bases to represent the corresponding parts of the vectors. This leads to the 
concept of hierarchical vectors. 

A hierarchical vector with m subsets and bases of rank k requires mk 
units of storage, and typical operations like the evaluation of norms and 
inner products or linear updates can be carried out in 0(mk 2 ) operations. 

Using an auxiliary basis, the product of a hierarchical vector and an "H 2 - 
matrix can also be computed in 0(mk 2 ) operations, and if the result admits 
an approximation with m subsets in the original basis, this approximation 
can be obtained in 0{{m + m)k 2 ) operations. Since it is possible to compute 
the corresponding approximation error exactly, sophisticated error control 
strategies can be used to ensure the optimal compression. 

Possible applications of hierarchical vectors include the approximation of 
eigenvectors, optimal control problems, and time-dependent partial differen¬ 
tial equations with moving local irregularities. 


1 Introduction 

We consider the standard Poisson problem 

—A u(x) = f(x) for all i6(l, 

u(x) = g(x) for all x G dkl, 

where U is a Lipschitz domain with boundary dQ, f : fl — > M and g : dVt —> M are given 
and we are looking for the solution u : fi —> M. 

If /, g and the boundary of U are sufficiently smooth, classical regularity theory states 
that the solution u will also be smooth. If only / is smooth, interior regularity results 
state that the solution u will at least be smooth in the interior of U, but may have 
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singularities at the boundary. If / is smooth only part of the domain, the solution u will 
still be smooth in the interior of this part. 

Discretization schemes can take advantage of these properties to significantly reduce 
computational work and storage requirements: a standard finite element scheme can use 
fairly large elements to represent smooth parts of the solution and refine the triangulation 
locally close to the non-smooth parts mmm, and sophisticated error estimation 
techniques [D El H HU IH [18] have been developed to automatically choose parts of 
the mesh that should be refined. A particularly elegant approach can be developed for 
wavelet techniques |8] by looking for the “most important” among the (infinitely many) 
coefficients of the solution. 

All of these techniques rely on special properties of the operators and spaces involved 
in the computation, e.g., coercivity of bilinear forms or local approximation estimates 
for finite element spaces. 

Some of these requirements can be avoided by following a purely algebraic approach: 
instead of using a locally refined discretization, we rely on a uniform discretization that 
can represent all functions expected to appear in the algorithm sufficiently well. For the 
sake of simplicity, we assume that the discretization corresponds to mesh points (xj)jgx, 
where T is a finite index set, e.g., the set of nodal points of a finite element discretization. 
Each function usSl then corresponds to a vector u E given by 

Ui = u(xi ) for all i El. 

Since we are using a uniform mesh, the dimension n = ffX can be expected to be very 
large, and working with vectors u E directly would take too long and require too 
much storage. 

We can significantly improve the efficiency by using data-sparse approximations of 
vectors. The compression scheme takes its cue from 'H 2 -matrices [HI 0 [3]: if a function 
u : —> M is smooth in a subdomain oj C D, e.g., due to interior regularity properties, 
we can approximate u | w by polynomials and obtain 

k 

u(x) ~ 'y^p v {x)u v for all x E u, 

V=1 

where {pu)t=i is a polynomial basis and u E is a matching coefficient vector. For the 
corresponding vector u we have 

k 

Ui = u(xi) ~ p u (xi)u u for all i El with Xi E u. 

v=l 


Introducing the subset 


Cj := {i E 1 : Xi E ui} 


and the matrix V E M u,xfc with 

Vi v := Pu(xi) 
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for all i E Cj, u E [1 : k] 


we can write the approximation result in the short form 


This approximation is only valid for indices in the subset u C X. In order to obtain an 
approximation for the entire vector u, we split the index set X into m disjoint subsets 
u and approximate each subvector u|&.. The resulting approximation of u 
requires mk coefficients, and if the function u is smooth in large subdomains of 17, we 
can expect mk <C n. 

Having a representation of u by mk coefficients at our disposal, we are of course 
interested in performing algebraic operations with these representations, e.g., computing 
linear combinations of compressed vectors, evaluating inner products and norms, and 
multiplying compressed vectors by matrices. Under suitable assumptions, all of these 
operations can be carried out in 0(mk ) or 0(mk 2 ) operations. 

Compared to standard adaptive finite element methods, this approach has several 
advantages: 

• hierarchical vectors can be used with any matrix that can be approximated by an 
H 2 - matrix, e.g., matrices arising in the boundary element method or in the context 
of population dynamics, 

• refining and coarsening a hierarchical vector only involves adding and removing 
subtrees of a prescribed cluster tree, no special treatment of hanging nodes or 
differing polynomial degrees is required, 

• linear combinations and inner products of hierarchical vectors corresponding to 
completely different subdivisions of the index set can be computed efficiently, and 

• the approximation error in all of these operations can be computed exactly. 

The algorithm for the efficient multiplication of a hierarchical vector by an "H 2 -matrix 
requires certain precomputed auxiliary matrices, and in a simple implementation the 
setup of these matrices would require 0(nk 2 ) operations, making the method only at¬ 
tractive in situations where a large number of matrix-vector multiplications have to be 
carried out with the same 77 2 -matrix, e.g., for time-dependent problems like the heat 
or wave equation, or for the approximation of eigenvectors by a preconditioned inverse 
iteration. 

This disadvantage can be overcome if the differential or integral operator underlying 
the matrix is translation-invariant, since this property implies that matrices correspond¬ 
ing to translation-equivalent blocks are identical, and it can be expected that computing 
the auxiliary matrices only once for each equivalence class reduces the complexity to 
0(log(n)fc 2 ). Translation-invariance can even be exploited if it is available only in a 
subdomain. 
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2 Hierarchical vectors 

In order to be able to construct counterparts of local refinement and coarsening of 
meshes, we introduce a hierarchy of subsets of X. 

Definition 2.1 (Labeled tree) Let V be a finite set, let r € V, let S : V V(V) be 
a mapping from V into the power set of V, and let l : V —> M be a mapping from V 
into an arbitrary set M. 

T = (V, r , S, t) is called a (labeled) tree if for each v € V there is exactly one sequence 
vo,vi,... ,V£ 6 V such that 

vq = r, Vc = v, Vi £ S(vi- 1 ) for all i £ [1 : I]. 

In this case, r is called the root of T and denoted by root (T), and S{y) are called the 
sons of v & V and denoted by sons (T,v). 

For each v € V, l(v) £ M is called the label of v and denoted by v. 

Definition 2.2 (Cluster tree) Let Ti = ( V,r,S,t) be a labeled tree. We call it a 
cluster tree for the index set X if 

• r = X, i.e., if the root is labeled with X, 

• we have 

t = [J t' for all t G V with sons(f) / 0, 

t'Gsons(t) 

i.e., the label of a cluster is contained in the union of the labels of its sons, and 

• we have 

t\ 7 ^ t 2 =$■ t\ n t 2 = 0 for all t 6 V, ti, € sons(f), 

i.e., different sons of the same cluster are disjoint. 

If Ti is a cluster tree, we call the elements t G V clusters and use the short notation 
t. elf for t G V. 

Definition 2.3 (Leaves) Let Ti be a cluster tree. A cluster t e If is called a leaf of 
Ti if sons(7x, t) = 0 . 

The set of all leaves is denoted by 

Ci\={teTi : sons(7x, t) = 0}. 

Remark 2.4 (Leaf partition) A simple induction yields that the set 

{i : t £ Ti} 

is a disjoint partition of X, so we can describe a vector x £ uniquely by defining its 
restrictions 

x\j. £ M* for all t £ 
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Cluster trees for arbitrary index sets X can be constructed by fairly general algorithms 
usually based on recursively splitting a given subset into a number of disjoint subsets. If 
the indices correspond to geometric objects, e.g., points in a finite element mesh, these 
algorithms can ensure that clusters contain indices that are “geometrically close” to each 
other mm- 

In practical applications, it may be necessary to use different cluster trees to represent 
different vectors, e.g., to implement adaptive refinement towards moving singularities. 
In order to keep the corresponding algorithms simple and still be able to handle varying 
cluster trees, we use a reference tree Tx that remains fixed and choose subtrees T x to 
represent vectors x G M 1 . 

Definition 2.5 (Subtree) Let Tx be a cluster tree for X. A second cluster tree T x for 
Z is called a subtree ofTx if 

• root(%) = root(7x), 

• we have 


t G Tx for all t £ T x , and 

• we have 

sons(T x ,t) 7 ^ 0 => sons(7i, t) = sons(7x, t) for all t G T x , 

i.e., non-leaf clusters have the same sons in T x and Tx- 

If T x is a subtree of Tx, we denote its leaves by C x . 

The smallest subtree T x of Tx consists only of the root r = root(7x), with the root a 
leaf of T x - The largest subtree is Tx itself. 

Due to Remark |2.4[ the leaves of a subtree T x also define a disjoint partition of X, but 
this partition can be significantly coarser than the one corresponding to the leaves of Tx- 
Given a partition of X, we now turn our attention to systems of bases that can be 
used to represent the subvectors x\ f corresponding to the leaves t G T x of a cluster tree. 

In order to be able to “refine” a given hierarchical vector, i.e., to subdivide leaves of the 
corresponding subtree, we require these bases to be nested , i.e., if x\f can be represented 
in the basis corresponding to the cluster t G Tx, it has to be possible to represent x\p in 
the basis corresponding to its sons tl G sons(7x,t). 

Definition 2.6 (Cluster basis) Let k G N and let (Vt)teTx be a family of matrices 
such that Vt G Mf xk for all t G 7x. 

If there is a family {E t )t & j- X of matrices satisfying 

Vt\t'xk = Vt'E t ’ for all t G Tx, t' G sons(7x,f), (2.1) 

we call (V,E) a cluster basis of rank k for the cluster tree Tx- The matrices {E t )t^Ti 
are called transfer matrices. 

We simply write {Vt)t^_Ti as an abbreviation and introduce the transfer matrices if they 
are required. 
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procedure refine(t, var x); 
if sons(7x, t) T 0 then begin 
Add sons(7x, t) as new leaves to T x ; 
for t! G sons(7x, t) do 
x t > G- E t ix t 

end 


Figure 1: Refining a hierarchical vector 


Definition 2.7 (Hierarchical vector) LetT x be a subtree of lx, let (V,E) be a cluster 
basis for Tx- 

A vector x G is called a hierarchical vector corresponding to Tx and (V,E) if there 
is a family {xt)tsT x such that 

x\f = VtXt for all t G C x . (2-2) 

In this case, we call (x t )t&T x the hierarchical coefficients for x. 


In our setting, the leaves of the subtree T x play the role of the mesh used to repre¬ 
sent a function. Locally refining the mesh corresponds to choosing a leaf t G C x with 
sons(7x,f) T 0 and adding sons(7x,f) to the subtree. Due to (2.2) and ( |2.1| ), we have 

x\p = (VtXt)\p = V t \{, xk xt = Vt'Et'Xt for all t' G sons(7x,f), 

so the equation 

Xf := E t 'Xt for all t' G sons(7x, t) 


provides us with hierarchical coefficients for the refined tree. The procedure is summa¬ 
rized in Figure [lj 

We can use this procedure to add a hierarchical vector x with subtree T x to another 
hierarchical vector y with a different subtree T y , as long as T x and T y are subtrees of Tx'- 
assume that a cluster t G T x D T y is given. 

1. If t G C x and t G C y holds, we can simply add xt and yt- 

2. If t 0 C x and t 0 C y , we consider sons(7x, t) = sons(7^, t) = sons(T y , t) recursively. 


3. If t G C x and t $ £ y , we use (2.1) to obtain temporary coefficient vectors Xf = Et>xt 
that can be added recursively to y. 


4. If t 0 C x and t G C y , we apply refine to y and proceed as in case[2j 

Based on this approach, the update y G- y + ax can be performed by the recursive 
algorithm given in Figure [2] The procedure add.leaf is used to handle the cases [l] and 
[3j while the procedure add takes care of the cases [2] and |4} 

After the procedure add has been completed, T x is a subtree of T y and the sum of ax 
and y is represented exactly by the, possibly refined, hierarchical vector y. 
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procedure adcLleaf (t, zt, var y)\ 

if sons(7^,i) = 0 then 

yt<-yt + 

else 

for t' G sons(7^,t) do begin 

Zf EfZt ; 

add_leaf(t', Bf, y) 

end; 


procedure add(t, a, x, var y); 
if sons(7^,t) = 0 then 
add_leaf(i, otxt, y); 
else begin 

if sons(7^,t) = 0 then 
ref ine(t, y); 
for t' G sons(7^,i) do 
add(t / , a, x, y) 

end 


Figure 2: Adding a hierarchical vector x to a hierarchical vector y, refining the tree T y 
as required 


Remark 2.8 (Complexity) Since we only switch to the sons t' G sons(7x,t) of a 
cluster t if either sons(7^,i) or sons(T y ,t) are not empty, the procedure add given in 
Figurerequires 0{k 2 (ffT x + ffT y )) operations. 

We can follow a similar approach to compute inner products and norms of hierarchical 
vectors: let x, y G K 1 be hierarchical vectors with subtrees T x and Ty- In order to 
compute the inner product 

(x,y) = 

i£l 

we can split Z into subsets and consider sub-products 

(x,y)t ■= 

iet 

corresponding to clusters t G Tx- For t = root(7x) = root(7^) = root (7^), we obtain the 
full inner product (x,y). 

Let t G Tx- If t G L x and t G C y , we have 

x \t = V txt , y\t = V t y t 


and find 

(x,y)t = y^xw = ^2(V t xt)i{V t yt)i = x* t VfV t y t . (2.3) 

iet iet 

The products Ct '■= VfVt required to evaluate this expression are small k x k matrices 
that can be prepared using the recursion 


C t = 


V t *V t 


if sons(7x, t ) = 0, 


. Et'esons(t) E t> °t' E t> otherwise 


for all t G Tx 


(2.4) 


due to (2.1). With these matrices, the inner product (2.3) can be evaluated in 0(k 2 ) 
operations. 
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function dot_leaf(f, zt, y ) : real; 

if sons(7^,i) = 0 then 
return zfC t y t 
else begin 

7 0; 

for t' £ sons(7^,i) do begin 

Zt' <— EyZt'i 

7^—7 + dot_leaf (t',z t ',y)-, 

end; 

return 7 
end 


function dot(i, x, y ) : real; 

if sons(7^,t) = 0 then 

return dot_leaf(t, xt, y) 
else if sons(7^,i) = 0 then 
return dot_leaf(f, yt, x) 
else begin 

7^—0; 

for t' £ sons (T x , t) do 
7 G- 7 + dot(t',x,y); 

return 7 
end 


Figure 3: Compute the inner product of two hierarchical vectors x and y 


If t 0 C x and t 0 C y , we can use Definition |2.2| to get 

{ x -, y)t = X] 

t / Gsons(7x?^) 

i.e., we can compute the products for the sons t' G sons(7x,t) by recursion and add the 
results. 

If t G C x and t £ T y \ C yi we can again use Definition |2.2| and 

( x,y)t= Y y)t' = Y ( v t%t,y)t' 

t' Esons(7y ,t) t'€sons(7~ y ,t) 

= Y ( v t iE t iX t,y)t’ = Y (' v t'Zf,y)t 1 

t'Esonsff) t'£sons(t) 

with the auxiliary vectors zy = E t ixt ■ We can repeat this procedure recursively until we 
reach a leaf t G C y and then use C* as before. 

If t G T x \ E x and t £ C y , we can use the same approach with auxiliary vectors 
Z? = E t iy t . 

The resulting recursive algorithm is summarized in Figure [ij Due to ||x|| = y/ (x, x), 
we can also use this function to compute the norm of a hierarchical vector. 

Remark 2.9 (Complexity) Preparing Ct for all t £ Tx requires 0(k 2 ffZ) operations 
for the leaves and 0(k 3 jfTi) operations for the non-leaf clusters & Section 5.3]. 

Once these matrices have been prepared, the procedure dot given in Figure^ requires 
0(k 2 (ffT x + ffT y )) operations. 

3 Coarsening 

Adding two hierarchical vectors x and y using the procedure add given in Figure [2] will 
yield a new vector with a refined cluster tree that contains both T x and T y as subtrees. 


(2.1) to obtain 
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This tree may not be optimal, as can be seen by considering the extreme example of 
adding x and —x and obtaining the zero vector that can obviously be expressed by the 
minimal subtree of Tx- 

In order to keep the computational complexity as low as possible, we introduce an 
algorithm that does the opposite of refining a subtree, i.e., coarsening the tree. Where 
the procedure refine given in Figure [l] splits a leaf into sons, the new coarsen procedure 
merges sons into a new leaf. 

If we would use this procedure only in situations where it does not change the vector 
at all, it would be of very limited use. It makes more sense to consider situations where 
coarsening yields a reasonably good approximation of the original vector. 

In order to devise a reliable algorithm, we have to investigate the approximation 
errors introduced by hierarchical vectors. We are interested in nothing less but the 
best approximation of a given vector, and this best approximation with respect to the 
Euclidean norm is given by an orthogonal projection. These projections are immediately 
available to us if we have an orthonormal basis at our disposal. 

Definition 3.1 (Isometric cluster basis) A cluster basis ( Qt)teTx ca ^ e d isometric 
if we have 


QtQt = I 


for all t £ Tx- 


(3.1) 


There is an efficient algorithm that can turn any cluster basis into an isometric cluster 
basis without any change in its approximation properties [3, Section 5.4], so requiring a 
cluster basis to be isometric is not a significant restriction. 

For an isometric basis, a simple computation yields 

||x — Qty\\ 2 = \\x — QtQ* t x\\ 2 + ||y — Q* t x\\ 2 for all t £ 7j, x £ M*, y £ M fc , (3.2) 


and we conclude that QtQ^x is the best approximation of x in the range of Qt- 

Just as refine splits a given leaf cluster t into its sons t! £ sons(7x, t ), we are looking 
for an algorithm that merges the sons t' £ sons(7x, t ) into the father t. We assume that 
t £ T x is given in such a way that all of its sons t! £ sons(7^, t) are leaves of T x ■ To keep 
the presentation simple, we consider only the case of a binary tree, i.e., ff sons(71J;, t) = 2 
and sons(7i,f) = {ii, ^ 2 }- Since the sons are assumed to be leaves of T x , Definition 2.7 
yields 


X\f = 


= I x, ti ) = 


X 


*2 


Qt\x 1 1 
Qt2^t2 


We want t to become a leaf, so we have to find a coefficient vector x t with x\i 
Due to (3.2), the best choice is given by the orthogonal projection, i.e., 


Qtxt- 


x t := Q*x | 


(3.3) 


Computing xt by this equation would be very inefficient, since it would require us to first 
construct the entire vector x\ £ and then approximate it again. In order to reduce the 
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procedure coarsen(t, var x)\ 
if t' G C x for all t' G sons(7^,t) then begin 
x t G- 0; 

for t! G sons(7x, t) do begin 

x t ^x t + F*,x t r, 

Remove t' from T x 

end 

end 


Figure 4: Coarsening a hierarchical vector 


number of operations, we rely on the nested structure (2.1) of the cluster basis: denoting 
the transfer matrices for (Qt)tsT x by {F t )teTxi we have 

Qt = 

and find 

x t = Q* t x\ i ={F* 1 Q* tl F t * 2 Q* t2 ) 


Q tl F tl 

Qt 2 F t2 


Qt, xt, 

Qt2 X t2 


_ ( 77 * 77 * \(Q 

^ h t2 > \QlQtJ-t 


QuQuf a \ _ 

t2'° iT ' 2 - " c 2 


= G<* K) (G) = 


Using this equation, we can compute the optimal xt given by (3.3) efficiently. The 
coarsening procedure is summarized in Figure |4j 

Remark 3.2 (Complexity) If there is a constant C sn such that ^ sons(7x, f) < C sn 
holds for all t G Tx, the procedures refine and coarsen require only 0(k 2 ) operations. 

In order to obtain an adaptive algorithm, we have to be able to control the error 
introduced by coarsening steps. We define 


Qt 


and find that (2.1) takes the form 

Qt = 


The error can be written as 

x\t ~ QtQt x \i = 


Qt,xt, 

Qt2 X t2 

Qt i 


Qu 


Ft, 

Ft2 


Qt, 

Qt, 

Xt, 

Xt 2 


R( 2fc ) 


xk 


Qtn 


Qt- 


QtQt 


Qt2 

- QtQ* 


Qt 


Q 


t2 


Qt, xt, 

Qt-2 Xf.2 


Xt, 
Xt 2 
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Since Qt t and Qt 2 are isometric matrices, they leave the Euclidean norm unchanged and 
we conclude 


\x\f - Q t Q*x\A\ = 


Xtl )-QtQ* t ( Xt] 

Xt 2 J \ X t2 


In theory, this equation allows us to evaluate the error explicitly in 0(k 2 ) operations. In 
practice, however, we are subtracting two vectors with, hopefully, very similar entries, 
so we have to expect rounding errors to influence the result significantly. 

We can avoid this problem by introducing a suitable auxiliary matrix: since Qt, Qt 1 
and Qt 2 are isometric, we have 


I = QtQt 



QtQt, 


so the matrix Qt is also isometric. This means that we can extend it to an orthonormal 
basis, i.e., we can find an isometric matrix T) G R( 2fc ) xfc such that 

(Qt g M (2fc)x(2fc) (3.4) 

is orthogonal and square, e.g., by computing the Householder factorization of Qt and 
accumulating the elementary reflections. This implies 


I = 




QtQt + PtPi 


1 > 


I - QtQt = P t P t 


t > 


and we conclude 



= PtP? 



Since Pt is isometric, the first factor on the right-hand side does not influence the norm 
and we have proven the following result: 


Theorem 3.3 (Coarsening error) 


The matrices ( Pt)teTx\C x defined, by (3.4) satisfy 


x\t QtQt,x\t 



for all t G Tx \ C-z, x\f 


( QttXtA 

yQt 2 Xt 2 J 


(3.5) 


Remark 3.4 (Implementati on a nd complexity) To obtain a fast and robust algo¬ 
rithm, we can construct Pt in (3-4) by applying k Householder reflections P±, ■ ■ ■, Pk to 
triangularize Qt, i.e., to obtain PkPk- 1 • • • P\Qt = R with an upper triangular matrix 
R G M 2fcxfc . Now Pt consists of the last k columns of P( Pf . . . Pf. If we compute 


uu-,...u(5), 

we find the error vector in the last k rows of the result. 


(3.6) 
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By choosing the signs in the construction of the Householder vectors correctly, it is 
possible to ensure that the first k columns of PfPf ... Pf coincide with Q t . In this case, 
we can find the coefficient vector xt = Q* x \t the first k components of 
any additional work. 

If we assume that constructing a Householder vector of dimension n E N and ap¬ 
plying the corresponding reflection takes not more than C qr n operations, finding the 
k Householder reflections for triangularizing the (2k) x k-matrix Qt takes not more 
than C qr (2k)k 2 = 2C qr k 3 operations and applying P t * to a vector takes not more than 
C qr (2k)k = 2 C qr k 2 . 

In the general case, i.e., if we do not assume Tx to be a binary tree, we get 

C qr k 3 ff sons(7x; t) and C qr k 2 ffsons(Tx,t) 

operations, respectively, and we can conclude that preparing the matrices Pt for the entire 
cluster tree takes not more than C qr k 3 ffTx operations, while coarsening a hierarchical 
vector with a subtree T x takes not more than C qr k 2 ffT x operations. 

4 % 2 -matrices 

We have developed algorithms for adding hierarchical vectors, for computing norms and 
inner products, and for refining and coarsening the corresponding subtrees. 

Now we consider the multiplication of a hierarchical vector by a matrix. Let Tx and 
Tj be cluster trees for the index sets T and J. We are looking for an algorithm that 
takes a matrix A E M 2:x ^ and a hierarchical vector x 6 corresponding to a subtree 
T x of Tj and computes a new hierarchical vector y E such that y = Ax, and we 
would like this computation to take only 0(k 2 ffT x ) operations. 

This is obviously not possible for general matrices A, so we have to restrict our atten¬ 
tion to a suitable subset of matrices. H 2 -matrices HH0E1 have the necessary properties. 

Just like a hierarchical vector is based on a cluster tree Tx that describes a hierarchical 
splitting of the index set T, an L/ 2 -matrix is based on a block tree that describes a 
hierarchical splitting of X x J. 

Definition 4.1 (Block tree) Let Txxj = (V,r,S,i) b e a labeled tree. We call it a 
block tree for the cluster trees Tx and Tj if 

• for each b E V there are t E Tx and s E Tj such that b = ( t, s ) and b = t x s, 

• root(Txxj) = (root(7j),root(7»), 

• if b = (t, s) € V is not a leaf, we have sons(lxxj, b) = sons(7x, t) x sons(Tj, s ). 

If Txxj is a block tree for Tx and Tj, we call the elements b E V blocks and use the 
short notation b E Txxj for b E V. We denote its leaves by C-xxj- 


3.6) without 
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It is easy to see that a block tree Txxj for Tx and Tj is a special cluster tree for the 
index set I x J, and Remark |2.4| yields that 

{tx s : b = (t,s) E Cxxj} 

is a disjoint partition of I x J, so we can describe a matrix A E uniquely by 

defining its restrictions A\f y - for all b = (t,s) E C-Xxj- An 'R 2 -matrix represents these 
submatrices by a three-term factorization using cluster bases. 

Definition 4.2 (L^-matrix) LetTxxj t> e a block tree for Tx andTj. Let (Vt)teTx an d 
(W s )s£Tj t>e cluster bases for Tx and Tj, respectively. 

We call a matrix A E M- rx ' 2 an TL 2 - matrix with respect to Txxj, (VtfteTz an d {W s )s&Tj 
if for each b = (t, s ) E C-Xxj we can find S b E M. kxk such that 

A\ txS = v t s b w;. (4.1) 

In this case, the matrices S b are called coupling matrices, the cluster basis {Vt)teTx ts 
called the row cluster basis, and the cluster basis (W s ) s ^p 7 is called the column cluster 
basis. 


Remark 4.3 (Special case) A more general definition of TL 2 -matrices is commonly 
found in the literature mew Our definition is equivalent if 

• all leaves of the cluster trees Tx and Tj appear on the same level, and 


• for all leaves t E Lx and s E Cj, the matrices V) and W s have full rank. 

The first assumption allows us to avoid special cases in the construction of the block 
tree, the second assumption allows us to express all leaf blocks in the form (f.l), even if 
they do not satisfy the admissibility conditions that are usually employed to determine 
approximability. 

We make both assumptions only to keep the presentation simple, all algorithms and 
theoretical arguments in the following can be extended to the general case by handling a 
moderate number of special cases. 


5 Matrix-vector multiplication and induced bases 

Let x E M" 7 be a hierarchical vector for a subtree T x of Tj and an isometric cluster basis 

( Qt)teT x ■ 

Let A E M 2:x ' 2 be an L/ 2 -matrix for the block tree Txxj, the row cluster basis {Vt)teTx 
and the column cluster basis ( W s ) s ^'x J . We assume that both bases have rank La, while 
we keep using k to denote the rank used by the hierarchical vectors. 

We want to compute the matrix-vector product y := Ax E If 1 efficiently, i.e., the 
number of operations should be in 0((Ica + k) 2 T x )• 

If x was a vector without hierarchical structure, we could split A recursively into 
submatrices according to the block tree and evaluate the contributions of the leaves to 


13 



the result. Since x is a hierarchical vector, we have to modify the procedure and stop 
splitting as soon as we reach a leaf of the subtree T x describing the structure of x: let 
b = (t, s ) E 7xxj with s E T x . We consider the problem of evaluating A\f xS x\$. 

1. If sons(7xxj’> b) = 0, we have A\f y .x\- S = VtS^W*x\g and can compute the product 
explicitly. 

2. If sons(7xx j, b) ^ 0 and sons(7^,s) / 0, consider all submatrices A\t' XS ' with 
b' = (t 1 , s') E sons(7xxj, b) by recursion. 

3. If sons(7x x > 7 ) b) 7 ^ 0 and sons(7^, s) = 0, we have no choice but to compute A\^ x ~x\^ 
directly. 

Case [ 2 ] is straightforward, we only have to ensure that the result y has a hierarchical 
structure that matches the recursion. This can be easily accomplished by using the 
procedure refine. 

Case [l] can be handled as for standard % 2 -matrices: we prepare auxiliary vectors 
x s = W*x\g for all s E T x in advance by a backward transformation , accumulate all 
contributions to a row cluster t E Tx in an auxiliary vector y t , and finally add Vtyt to 
the result using a forward transformation. 

Let us first consider the backward transformation. If s E T x is a leaf, we have x|j = 
Q s x s by definition and need to compute 

x s = w;q s x s . 


If we have the auxiliary matrices 

D s := W S *Q S E R kAXk for all s E Tj 


at our disposal, we can evaluate x s = D s x s in O(kAk) operations. In order to prepare 
these matrices, we can follow a similar approach as in (2.4): denoting the transfer 
matrices for the column basis {W s ) s& 'j- J by ( E\y,s)seTj and the transfer matrices for the 
vector basis (Qs)seTj- by (F s ) s& j- J , we can use (2.1) to get 


D, = 


for all s E Tj, (5.1) 


WfQ s if sons( 7 ) 7 , s) = 0, 

.£s'esons(7>,s) E w,s’Ds'F s ' otherwise 

and this allows us to compute all of these matrices in 0(kA(kA + k)kffTj) operations. 


If s E T x is not a leaf, we can again use (|2.1|) to find 
x, = w:x\& = 


s*\s- E W *\*s'xkX\s'= E E *W,s' W s 

s' Gsons(7^/ ,x) s'G sons(7^,cc) 

E E w,s'^s’ for ad s &T y , sons(7^, s) / 0. 
s' Esons(7^ ,x) 




The resulting algorithm is called the forward transformation and is given as the proce¬ 
dure forward in Figure [5} 


14 





procedure forward(s, x, var ( x s ) S £j x ); 
if sons(7^, s) = 0 then 
X s i D s x s 

else begin 

X s <r- 0; 

for s' G sons(7^,s) do begin 
forwards', x, ( x s )s&T x )\ 
x s <r- x s + E~w s ,x s > 

end 

end 


procedure backward(f, var (y t )teT v , v)\ 
if sons(7^, t) = 0 then 

yt<~m + Vt 

else 

for t' G sons(7^,t) do begin 
W W + E Vjt iy t ; 
backward(f', ( yt)t&T y , y) 

end 


Figure 5: Perform forward and backward transformations for hierarchical vectors 


If our recursive algorithm encounters a leaf b = (t, s ) G C-zxj, it looks up x s = W*x\$ 
among the vectors prepared by the forward transformation, multiplies it by «S&, and 
adds it to an auxiliary vector yt that collects all contributions to a row cluster t. As 
mentioned in the discussion of case [2j we assume that a subtree T y is created by the 
recursive procedure that ensures t G T y , so we only need the auxiliary vectors for these 
clusters. 

In a last step, we have to take care of these temporary results, i.e., we have to add 
Vtyt to the final result for all t G Ty If we represent the result y G as a hierarchical 
vector with the cluster basis (Vt)t£Tu we can handle leaves t G C y directly by adding y t 
to the corresponding coefficient vector y t . If t G T y \ C y is not a leaf, we can use (2.1) 
again to find 


(Vtyt)lf + V t 'y t > = Vt\p xk y t + V t >y t ' = V t >(E Vit 'y t + y t >) for all t’ G sons(7^, t), 

i.e., instead of adding Vtyt to y\f directly, we can also add Ey t /y t to y t > and handle the 
son clusters t' G sons(7^,i) by recursion. The resulting algorithm is called the backward 
transformation and is given as the procedure backward in Figure [5] 

While the cases [l] and [ 2 ] can be handled essentially as in the case of standard V. 2 - 
matrices, the case [3] requires special treatment: if we encounter a leaf s G C x and a block 
(t, s ) G Tixj that is not a leaf of the block tree, we cannot afford to subdivide s further, 
since we are aiming for an algorithm with only 0((k 2 + k\)ffT x ) operations. We have 
to find a way to add 

= A\f X gQ s x s 

to the subvector y of the result y. 

We can face this challenge by using induced cluster bases [3], Section 7.8]: instead of 
representing the result y in the row cluster basis (I 4 )te 7 y, we use a cluster basis that 
also contains the products A|£ x -Q s G W txk for all t G Tx with (t, s ) G Txxj \ Ezxj- 
To define the induced cluster basis, we introduce 


row (t) ■.= {s eTj : (t, s ) G Tixj \ Eixj} for all t G Tx- (5.2) 
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This set contains all column clusters s E Tj that appear in non-leaf blocks with the row 
cluster t. These are the blocks that have to be handled by the induced basis. We let 
fit : = #row - (f) and fix s^i,... , st,p t E Tj such that 

row ~{t) = , s tt p t } for all t E 7j. 

Definition 5.1 (Induced cluster basis) Let A G be an H 2 -matrix for Txxj 

with row cluster basis (Vt)teTx> an d Jet ( Q s ) s &Tj be another cluster basis. 

We define it '■= b'A + kfit and 

U t := (V t A\ txSt i Q StA ... A\ {x - t0 Q St ^ e R ixit for all t E Tx (5.3) 

and call {Ufijt&Ti bhe induced cluster basis corresponding to the Ti 2 -matrix A and the 
input cluster basis (Q s ) s& 7 ^.. 


Remark 5.2 (Nested) Calling the induced cluster basis {UtjteTx a c ^ us t er basis is jus¬ 


tified, since it is nested & Lemma 7.22], i.e., it satisfies ( 2.1}) for suitable transfer 
matrices Ejj -p E 

If we use the induced cluster basis to represent the result y, case [3] can be handled by 
simply adding x s to the appropriate portion of the corresponding coefficient vector. To 
keep the notation simple, we denote these coefficient vectors by 



( yt ^ 



1 yt ^ 

yu,t = 

yt,s tl 1 

E R £t , 

yu,t = 

yt,s t , 1 





\yt,st,p t J 


Now the cases [l] and [3] can be handled almost in the same way. Figure [ 6 ] summarizes the 
recursive procedure coupling that handles all three cases. 

To complete our algorithm for the matrix-vector multiplication, we require the back¬ 
ward transformation for the induced cluster basis. In order to generalize the algorithm 
backward given in Figure [5j we require an understanding of how the transfer matrices 
for the induced cluster basis act on vectors. 

In order to keep the presentation simple, we again assume that the cluster tree Tj is 
binary and that for each cluster s E Tj \ Cj its sons are given by sons(7j r , s) = {si, S 2 }. 


The first block of (5.3) is straightforward: for t E Tx and t! E sons(7x,t), we have 

V t\t'xk = V t ,E t' 


by (2.1). The following blocks are a little more involved. Let s E row (t). The definition 


(5.2) yields b := (t, s) E 7xxj\£-Xxj, and with Definition 4.1 we obtain sons(7 Xxj, b) = 


sons(f) x sons(s). Restricting to t! and using ( |2.l[ ) gives us 

(A\ixsQs)\t'xk = A\t>x$Qs = (^If'xsi ^1 £'xJ 2 ) f 

= (^It'XSlQsi A\tl X s 2 Qs 2 ) 


Q 

Qs 2 E S2 


Sl E S! 


«1 


s 2 
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procedure coupling^ = (t,s), x, ( x s ) seTx , var y, {y t )t&T y )\ 
if sons(7xx j, b) = 0 then 
Vt t- yt + S b x s 
else if sons(7i,s) = 0 then 
yt,s «- yt,s + x s 
else begin 

if sons(7^,. t) = 0 then begin 
ref ine(t, y); 

for t' E sons(7x, t) do yu.p •(— 0 

end; 

for t' G sons(7x, t), s' G sons(7y,.s) do 

coupling(6 / = (t',s'), x , (.T s ) sera: , y, ( yt)t&T v ) 

end 


Figure 6: Evaluating all couplings between row and column clusters 


Let s' G sons(7^7, s). If b' := (t', s') is a leaf of Ixxj > we have 

A\i' xs'Q s ' ^s'yt, s — Vt 1 s 1 F s 'yt,s — Vj/ Sb' d s i F s 'yt,si 


and we can express the product by the first block in (5.3). 


On the other hand, if b' := ( t',s') is not a leaf of Tzxji we have s' G row ~(t') 
by definition and can express the product A\y xS ,Q s iF s i using one of the other blocks in 
(5.3). The resulting backward transformation for the induced cluster basis is summarized 


as the procedure induced.backward in Figure [7j 

Combining the forward transformation given in Figure [5} the coupling step in Figure [6j 
and the backward transformation for the induced basis in Figure [7] yields the matrix- 
vector multiplication algorithm given in Figure [8} 

Our goal is now to prove that this algorithm requires not more than 0((k\ + k 2 )#Tx) 
operations. In order to establish a connection between the number of clusters and the 
number of blocks, we require a standard assumption: the block tree has to be sparse 

BH EDI. 

Definition 5.3 (Sparse) LetTzxj be a block tree for Tz andTj■ We define 


row(Tzxjfi) ■= {s eTj : {t,s)eTzxj} 
coVJzxj, s) := {t G Tz : {t, s) G Tzxj} 

Let C sp G N. A block tree Tzxj is called C sp -sparse if 

#row(7z x y,t) < C sp , # col(7x x y, s) < C sp 


for all t G Tz, 
for all s G Tj. 


for all t G Tz, s G Tj. 


Lemma 5.4 (Forward transformation) The forward transformation forward given 
in Figure [5| requires 0(kA(kA + k)ffT x ) operations. 
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procedure induced_backward(t, var ( yu,t)teT y i y); 
if sons(7^,i) = 0 then 

yu,t <- yu,t + yu,t 

else 

for t' G sons(7^,t) do begin 

Vt' G- y t > + Eyyyt-, 

for s G row _ (f), s' G sons(7j-, s) do begin 
b' G- (t', s'); 

if sons(7x x ji b') = 0 then 
yt' G- y t / + SvD s iF s iy t ,s 

else 

yt',s' yt',s' + Fs'Vt,s 

end; 

induced_backward(t', ( yu,t)teT v , y) 

end 


Figure 7: Backward transformation for the induced cluster basis 


procedure eval(M, x, var y); 

Lx G- root(7x); Lj G- root(7»; 

Let 7^ be the minimal subtree of 7x containing only rx; 

yu,n G - 0; y Un G- 0; 

forward(r > 7, x, (x s ) ser J; 

coupling^ = (r x ,rj), x, (x s ) seTx , y, (yt)terj; 

induced_backward(?’x, (: yu,t)teT y , v) 


Figure 8: Matrix-vector multiplication, result represented in the induced cluster basis 
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Proof. We first note that the function forward is only called for clusters s £ T x - 
If s is a leaf, the multiplication by D s requires 2 kAk operations. 

If s is not a leaf, the function performs a multiplication by E s t for each of the sons 
s' E sons(7^,s). This takes 2k\ operations. 

Since each cluster has at most one father, not more than 2kA{kA + k) operations are 
required for each cluster. ■ 

Lemma 5.5 (Coupling step) Let Tixj be C sp -sparse. 

The coupling step coupling given in Figure^ requires 0(C sp kA(kA + k)ffT x ) opera¬ 
tions. 

If T y is the minimal subtree ofTz containing only the root prior to calling coupling, 
it will satisfy 

#Ty < CspffTx 

after completion of the algorithm. 

Proof. The procedure coupling is only called recursively if sons(Txxj, b) / 0 and 
sons(7^, s) / 0 hold. In this case, we have sons(7x x j, b) = sons(7x, t) x sons(7j-, s) and 
sons(7i,s) = sons(7j-,s) by definition, and therefore ( t',s') E Tixj and s' E Ty for all 
t' E sons(7x) and s' E sons(7j). Since coupling is first called with t = root(7x) and 
s = root(7j r ), we can guarantee that for each call to coupling we have b = (t, s ) E 7 xxj 
and s E %. 

This implies t E col(Tx x j, s), and since the block tree Txxj is Cgp-sparse, we have 
#col(7xxj',s) < C sp . 

In each call to coupling, we perform either 2 k\ operations to multiply x s by Sb, or 
k < 2kAk operations to add x s to yt yS - 

We conclude that the total number of arithmetic operations is bounded by 

£ £ 2kA(kA + k) < 'y ^ 2C sp kA(kA + k) — 2C sp kA(kA + k)ffTx. 

s€zTx iGcol(7xx JI s ) sdzTx 

The procedure refine is only called to extend the tree T y if sons(7 xxj,b) 0 and 
sons(7^, s) 7 ^ 0. We have already seen that in this case we have {t',s') E Txxj and 
s' E T x for all t' E sons(7x, t) and s' E sons(7j-, s). In particular, for each t' E sons(7x, t) 
added by refine, we can find a cluster s' E sons(Tr, s) C 7^ with t' E col(7x x j, s'). 

Since we start the procedure with a minimal subtree Ty containing only the root of 
7i, we can conclude that t E T y implies t E col(7x x j, s) for an s E %. The Cgp-sparsity 
of Txxj yields 

Ty C [J co\(Txxj,s), #T y < ^2 # col (7xxj r ,s) < C sp ffT x . 

seT x seT x 
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Lemma 5.6 (Backward transformation) Let Tzxj be C sp -sparse. 
The ranks of the induced cluster basis are bounded by 


it <kA + C sp k for all t £ 7x, 

and the backward transformation induced-backward for it given in Figure [?] requires 
0(C sp (k A + k 2 )ffT y ) operations. 


Proof. We first note that we have 

#row"(7 ixj,t) < #row(Tzxj,t) < C sp 


for all t £ If- 


By definition (5.3), this implies 

it k-A + C sp k 


for all t £ Tz- 


Let us now consider the number of operations required by inducedJbackward for a 
cluster t £ T y . 

If t is a leaf, yu.t is added to yjj,t, and due to (5.3), this requires 

kj{ + kff row ~(t) < kA + kff row(t) 


operations. 

If t is not a leaf, the multiplication of yt with Eyt' takes 2 k A operations, and for all 
s £ row _ (t) and s' £ sons(Tj, s) we perform either one multiplication with F s < or three 
multiplications with Sy, D s i, and F s >, so not more than 2 k A + 2kAk + 2 k 2 < 3 (k A + k 2 ) 
operations are required. Due to t' £ sons(7^,i), s £ row ~(t) and s' £ sons(7y,s), we 
have ( t',s') £ Tzxj and therefore s' £ row(7 zxj,t'). Since each s' has only one father 
s, we conclude that not more than 

a + ^2 3(k\ + k 2 ) = 2 k A + ^ 3(fc^ + k 2 ) 

s£ro'w~(TzxjJ) s ' Gsons(7j- ,s) s'erow(7zx jT) 

= 2 k A + 3 (k A + k 2 )ffiow(t') 

operations are required. The total number of operations is now bounded by 

Y, 2 k A + 3 (k A + k 2 )ffrow(t) < ^ 2 k A + 3 (k A + k 2 )C sp 
teTy teT y 

< (2 + 3C sp )(k A + k 2 )ffT~y. 


Theorem 5.7 (Complexity, matrix-vector multiplication) Let the blocktreelzxj 
be C sp -sparse. 

The algorithm eval given in Figure ^ requires 0(C 2 p (k A + k 2 )ffT x ) operations. 
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Proof. Lemma [5.4| yields that the forward transformation requires 0(kA{kA + k)ffT x ) C 
0((k A + k 2 )ffTx) operations. 

yields that the coupling step requires O(C sp kA{kA + k)ffT x ) C 0(C sp (k\ + 


Lemma 


5.5 


k 2 )#%) operations and that T y will subsequently satisfy ffT y < C sp ffT x . 

Lemma 5.6 yields that the backward transformation requires 0(C sp {k\ + k 2 )ffT y ) 
operations, and due to the estimate #T y < C sp ffT x , the number of operations is also in 

o{c 2 p {k\ + k 2 )#r x ). 

Setting up the initial vector y requires no arithmetic operations. ■ 


6 Adaptive conversion 


We have seen that the matrix-vector multiplication algorithm presented in Figures [5j 
©0 and [8] computes the exact result of the matrix vector multiplication y = Ax for a 
hierarchical vector x in 0(C 2 p {k\ + k 2 )ffT x ) operations. 

Unfortunately, the algorithm yields a result that does not use a cluster basis of our 
choosing, but the somewhat artificial induced cluster basis. This is particularly undesi¬ 
rable since the rank of the induced cluster basis can become quite large. 

We address this problem by developing an algorithm that approximates a given hier¬ 
archical vector by another hierarchical vector using a different basis. 

Let x E be a hierarchical vector corresponding to a subtree T x and a cluster basis 
(Vt)teT x - Our goal is to represent it by a hierarchical vector y E corresponding to 
a subtree T y and a second cluster basis ( Qt)teT x ■ We have already seen that isometric 
cluster bases are useful for purposes like this, so we assume that ( Qt)t&T x is isometric. 

Consider a leaf t E C x . We have x\ f = VfXt, and we could simply employ the orthogonal 
projection corresponding to Qt to obtain the approximation 


x\ t = V t x t 


QtQtVtx t , 


but we are faced with the question if this approximation is sufficiently accurate. 

Assume that we know that it is not. In this case, we split t into its sons and try to 
approximate the subvectors x\y by the projections Qt'Q* t tX\p corresponding to the sons 
t' E sons(7x,t). If the resulting errors are still too large, we keep splitting recursively 


until we are satisfied. Since we have (2.1) at our disposal, this procedure can be carried 


out efficiently and provides us with the required hierarchical vector y and subtree T y . 

Although T y is now guaranteed to be fine enough to satisfy our accuracy requirements, 
it may be too fine. Fortunately, we have the procedure coarsen (cf. Figure [4]) at our 
disposal to reduce the cluster tree again while ensuring that the error stays below a given 
bound. 

The resulting algorithm convert is given in Figure [9j but it is still incomplete: how 
can the algorithm judge whether an approximation error is “small enough”? 

For the second case, i.e., the coarsening of the vector y given in the isometric basis 
(Qt)teTx, we have already solved this problem: we can construct the auxiliary matrices 


(P t )t£Tz introduced in (3.4) and use (3.5) to compute the error norm explicitly and 
robustly. 
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procedure convert_leaf (t, Xt, var y); 

if || VtXt — QtQ*VtXt || small enough then 

vt «- Q*tV t x t 

else begin 
ref ine(t, y); 

for t' G sons(7x, t) do begin 
x t < -Bt'Xt; 

convert_leaf(t / , icy, y) 

end 

end 

Figure 9: Conversion from one cluster basis 
subtree 


procedure convert (t, x, var y); 
if sons(7^,t) = 0 then 
convert_leaf (t, xt, y) 
end else begin 

for t' G sons(7i,t) do 
convert (t', x, y); 

if ||y| t - — QtQJ'yljH small enough then 
coarsen(f, y) 

end 

to another with adaptively constructed 


We still have to consider the first case: a leaf t G T x is given and we have to compute 
the projection error 

\\V t x t -Q t Q* t V t x t \\. 

We can solve this problem by generalizing the approach used for the coarsening error: 
we construct k x k matrices {Zt)t^Ti an d isometric matrices ( Pt)teTx such that 


Vt - QtQ* t Vt = PtZt 

for all t G Tx, 

(6.1a) 

PtPt = I 

for all t G Tx, 

(6.1b) 

PtQt = o 

for all t G Tx- 

(6.1c) 


The first property (6.1a) states that PtZ t is a representation of the projection error, the 
second property (6.1b) simply restates that Pt is isometric. The third property (6.1c) is 
used in the construction of the families {Z t )t&Ti an d (Pt)teTx- 
Combining (6.1a) and (6.1b|) yields 


II V t x t - QtQtV t x t || 2 = \\P t Z t xth = \\ z txth 


for all t G Tx, Xt G 


and since Zt is small, we can evaluate the right-hand side efficiently. 

We construct the families (Z t )teT x an d {Pt)teT x by recursion. Let t G Tx- 


Leaf cluster. If t is a leaf of Tx, we have Vj and Qt at our disposal. We can use a 
Householder factorization to extend Qt to an isometric matrix, i.e., to find an isometric 
matrix Pt such that 

Qt ■= (Qt Pt) e R ixi 

is orthogonal, i.e., quadratic and isometric. This means 

^ = QtQ* t Vt = (< Q t Pt) V t = QtQ* t Vt + P t P;V t , 
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which is equivalent to 


Vt - QtQjVt = P t p;v t . 

If k -C #t, the matrix Pf Vt will have a large number of rows. Since it has only k 
columns, we use a thin Householder factorization 

P* t Vt = PtZ t , (6.2) 

where Zt is a k x k upper triangular matrix and Pt is isometric. We let 

Pt ■= PtPt 


and observe 


Vt = QtQt.Vt + P t P*V t = Q t Q*V t + P t P t Z t = QtQtVt + P t Z t , 

p;p = P;p t *PtPt = p t *p t = /, 
p;qi = p t *p t *Q t = p t * o = o. 


Non-leaf cluster. If t is not a leaf of 7x, we cannot use Vt and Qt, since they are only 
given implicitly via the corresponding transfer matrices. For the sake of simplicity, we 
assume #sons(f) = 2 and sons(t) = We have 


V t 


(V tl E tl \ 
\Vt 2 E t J ’ 


Qt 


(QuFtA 

\Qt 2 F t J 




Qt, 


where we let 



Assuming that Pt i, Z tl , P t2 and Zt 2 have already been computed by recursion, we find 


Vt - QtQ*V t = 


+ 


+ 


fVt.EtA 

[Vt 2 E t2 J 

(V tl E tl \ 

\y t2 Etj 



QtJ Q*® 
Qt) ^ 


(Qt! \ (VttEtA 
V QtJ \Vt 2 E t2 ) 

(Q* tl V tl E tl \ 

\Ql 2 Vt 2 E t J 


(V tl E tl \ _ (Q tl Q* tl V tl E tl \ 
V V t2 EtJ \Qt 2 Qt 2 Vt 2 E t J 


(Q tl Q* tl V tl E tl \ _ (Q tl 
\Qt 2 Ql 2 Vt 2 E t J V 
((V tl - Q tl Qt 1 V tl )E tl \ 
{(Vt a -Q ta Q! a V ta )EtJ 

(Qt, \ [ (Qu v u E u 
V QtJ [\Qt 2 Vt 2 E t2 



(QlAuEtJ 

\Qt 2 Vt 2 E t2 ) 


- QtQt 


(Q* tl V tl E tl \ 
Wt 2 Vt 2 EtJ_ ' 
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For the first term, our assumption yields 


(V tl -Q tl Q* tl V tl )E tl \ (P tl Z tl E tl 

Pt 2 Zt 2 E t2 


SV t2 -Q t2 Q* t2 V t2 )E t2 

For the second term, we introduce 


(6.3) 


v t •= (^ E A 

* \Ql 2 V t2 E t2 ) 


and obtain 


As before, we extend Qt to a square isometric matrix 

Qt ■■= (Qt P 


and obtain 


v t - QtQtV t = p t p*v t . 


(6.4) 


Combining the equations (|6.3|) and (|6.4|) yields 

'Qt! 


V t - QtQ*Vt = 


Pt i Ztt Et! 
Pt 2 Zt 2 E t 2 


+ 


PtP*Vt 



Pt! Qt! 
p t 2 Qt 2 


where it is important to keep in mind that Pt has 2k rows, so the dimensions of the first 
and second factor match. 

To obtain the final result, we compute a thin Householder factorization 


' E tl 

Z tj E t2 | = P t Z t 
P*V 


(6.5) 


with a k X k upper triangular matrix Zt and let 


Pt := 


Pt! Qt! 

Pt 2 Qt 



Pt. 


Since it is a product of three isometric matrices, Pt is also isometric, so (6.1b) holds. 
By our construction, we have 


V t - QtQtV = 


Pt! Qt! 

Pt 2 Qt 2 



PtZ t = PtZt, 


P 
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so (6.1a) holds, too. 

Since (6.1c) holds for the sons ti and t r 2 , we have 

( P l \ 


'I 


PtQt = Pi* 


= p: 


i 


p 


Q 


t2 


11 


p; 


QXJ 
fPuQt,Fu\ 
P t * 2 Qt 2 Ft2 

Ft, 

V F t2 ) 


Qu Ft, 
Qt 2 Ft 2 


= p* 



We have constructed Pt by extending Q t to an orthogonal basis, so we have PfQt = 0 
and conclude 


'I 


PtQt = P t * 


i 



o = p;[ o = o, 


i.e., (6.1c) holds also for t. The construction is complete, and we have proven the 
following result: 


Theorem 6.1 (Projection error) The matrices [Zt)t&Ti defined by (6.2) and (6.5), 
respectively, satisfy 


\\v t x t - QtQ* t V t x t \\ = \\Z t x t \\ for all t £ 7x, x t e M fc . 


The matrices Pt appearing in our construction are only 

we assume 


Remark 6.2 (Complexity) 

required for the proof, but not for the practical algorithm. As in Remark 3.f 
that there is a constant C qr such that a Householder vector of dimension n £ N can be 
constructed and applied in not more than C qr n operations. 

For a leaf cluster t e Ti, we first apply k Householder reflections to construct Pt. 
This takes not more than C qr (fft)k 2 operations. Applying the reflections to compute the 
matrix P*Vt takes C qr (fft)k 2 operations. This matrix has (fit) — k rows and k columns, 
so computing its Householder factorization requires C qr (fit — k)k 2 operations. We obtain 
a total of 

3 C qr (fii)k 2 


operations for leaf clusters. 

For a non-leaf cluster t 6 lx, the construction of Pt as a product of k Householder 
reflections takes C qr (2k)k 2 operations. Setting up the matrix 


(Zt,E tl \ 
I Zt, 2 E t , 2 I 

V PtVt ) 
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takes 2k 3 operations for both the first and second block and C qr (2k)k 2 for the third. The 
matrix has not more than 4 k rows and k columns, so the final Householder factorization 
takes not more than C qr (4:k)k 2 operations. We obtain a total of 

C qr (2k + Ak)k 2 + 4 k 3 = ( 6C qr + 4)k 3 


operations for non-leaf clusters. In the general case, i.e., if 7x is not necessarily a binary 
tree, we obtain the bound 

(SC qr + 2)k 3 #sons(Ti,t). 

Adding the operations for all clusters and taking Remark \2-4\ into account, we conclude 
that preparing ( Z t ) te j- X for all clusters takes not more than 

3C qr k 2 #l + (3 C qr + 2)k 3 #Ti 


operations. Under the assumptions of Remark f.3, the first term vanishes, since the 
matrices Qt have full rank in leaves, therefore no approximation error can occur. In this 
case, the computational work for preparing all matrices {Zt)t^Ti\C x 0{k 3 jfTz)- 


Remark 6.3 (Application to induced basis) If we apply the conversion algorithm 
to obtain an efficient procedure for the matrix-vector multiplication, the cluster basis V 
is the induced basis, and the rank of the induced basis can be bounded by k :a + C sp k < 
C S p{kA + k). For large clusters, we get a complexity of 0(C 3 p (kA + k) 3 ). Fortunately, for 
the majority of small clusters, we have fft < kA + C sp k, so the matrix Z t is smaller than 
our worst-case estimate suggests and the entire algorithm is still reasonably efficient. 


7 Numerical experiments 

We consider the application of hierarchical vectors to the task of computing eigenvectors 
of a matrix corresponding to a partial differential equation. In our case, we use a simple 
finite difference discretization of Poisson’s equation on the L-shaped domain (0,1) X 
(0, l)\[l/2,1] x [1/2,1], The inverse is computed by standard hierarchical matrix methods 
m and then converted into an Pf 2 -matrix [[3 Section 6.5]. The accuracy for inversion 
and conversion is chosen like 0(l/n) in order to compensate for the growing condition 
number. 

We use a variable-order polynomial basis [6] for the representation of the hierarchical 
vectors, where we use bicubic polynomials in the leaf clusters. The order in each coor¬ 
dinate direction is increase if the ratio of the extents of the bounding boxes of father 
and son are less than 3/5. This approach ensures that the order used in the root cluster 
increases by one each time the mesh is refined. 

The standard Lagrange basis is orthogonalized [3l Section 5.4] and the matrices 
(Pt)t£j- X and ( Zt)teTx f° r computing the coarsening and projection error are constructed. 
The implementation currently constructs the entire induced basis and applies the stan¬ 
dard backward transformation given in Figure [5] instead of the optimized version given in 
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Figure 10: Runtime of the inverse iteration using standard and hierarchical vectors for 
different mesh resolutions 


Figure [7J This choice may lead to a loss in performance for the matrix-vector multiplica¬ 
tion algorithm, but it allows us to use standard functions to deal with the induced basis 
instead of implementing specialized versions for all required operations. The theoretical 
estimates, particularly Theorem 23, remain valid. 

Once the setup is complete, we perform 20 steps of the inverse iteration using the 
■H 2 -matrix approximation of the inverse and measure the corresponding runtime. For 
different refinement levels ranging from 64 to 1024 intervals per coordinate direction, 
corresponding to 2977 to 784897 degrees of freedom, the times are represented in Fig¬ 
ure M we can see that hierarchical vectors are faster than standard vectors even for 
relatively small problems, and that they are faster by a factor of more than five for the 
largest problem. 

The error tolerance for the hierarchical vector compression starts at e = 10~ 5 and is 
approximately halved each time the mesh is refined The accuracy of the approximation 
is computed by converting the hierarchical vector to a standard vector and finding the 
Euclidean norm of the difference. Surprisingly, the results represented in Figure [ll] show 
that the error seems to converge at a rate of 0(l/n), while the given error tolerance 
decreases like 0(l/y/n). 

Next we investigate the influence of the chosen error tolerance. We fix the mesh with 
784897 degrees of freedom and consider a scale of error tolerances between 5 x 10 -10 
and 5 x 10 . The leaf partitions (corresponding to C x ) constructed by the conversion 

algorithm (cf. Figure [9]) in combination with the adaptive coarsening (cf. Figure [4]) are 
displayed in Figure [12J We can see that they display the typical behaviour of adaptively 
refined meshes: very small clusters are only used close to the singularity, while most 
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Figure 11: Accuracy of the hierarchical vector for different mesh resolutions 




Figure 12: Leaf partitions for error tolerances 5 X 10 ' and 5 x 10 10 
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Figure 13: Computing time in relation to the number m = 4KTx of clusters 


of the domain is covered by a few large clusters. While the algorithm requires only 
i^Tx = 86 clusters to reach the tolerance 5 x 10~ 7 , it takes #7^ = 480 clusters to reach 
the very high tolerance 5 x 1CF 10 . Since we are using the standard Euclidean norm in a 
space of dimension 784897, the latter is already fairly close to machine precision. 

Figure 13 shows the computing time for 20 steps of the inverse iteration in relation 
to the number of clusters in T x . The results confirm the prediction of Theorem 5.7 the 
runtime is directly proportional to m = #Tx- 

Finally, Figure 14 shows the relation between the error tolerance and the number 
of clusters in the resulting subtree T x ■ The number of clusters seems to grow like 
0(| log(e)| 3 ) depending on the error tolerance e. 


8 Conclusion and extensions 

Hierarchical vectors provide us with a purely algebraic counterpart of adaptively refined 
meshes: clusters of indices correspond to patches of the mesh, a cluster tree corresponds 
to a refinement hierarchy, cluster bases play the role of local trial spaces, and the error 
matrices (-Pi)teTr and (Z t ) te j x are used instead of local error estimators. 

These error matrices provide us with the exact error instead of just an estimate, 
allowing us, e.g., to compute best approximations, either by minimizing the number of 
terms required for a given accuracy or by minimizing the error for a given number of 
terms. 

The method can be applied to any operator that can be represented as an 77 2 -matrix, 
e.g., to integral operators of positive or negative order and to partial differential operators 
or the corresponding solution operators. 
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Figure 14: Number of clusters # T x in relation to the accuracy 


The complexity of the algorithms can be bounded under the standard assumption 
that the block tree Txxj used for the 74 2 -matrix is sparse. We have seen that in this 
case 0(k 2 #Tx) operations are required to perform a matrix-vector multiplication with 
a hierarchical vector corresponding to a cluster tree T x - The numerical experiments (cf. 
Figure [13]) confirm this estimate. 

In order to obtain optimal error control, our technique relies on the projection error 
matrices (Zt)telx used to adaptively convert the intermediate solution given in the in¬ 
duced cluster basis into the cluster basis ( Qt)teT x prescribed by the application. With 
the direct approach presented in this paper, these matrices require 0(kr#Tx) units of 
storage, and constructing these matrices requires 0(k 3 #Tx) operations. Since #Tx is 
typically quite large compared to #7^, the setup phase of the algorithm will take far 
more time than the actual matrix-vector multiplications. 

• A long setup phase is acceptable if a very large number of matrix-vector multipli¬ 
cations are performed, e.g., if we are using a timestepping scheme or if we apply 
an iterative method to compute eigenvectors or solve optimization problems. 

• If we only require an upper bound for the error, we can construct small local error 
matrices for each matrix block instead of the global error matrices (Zt)te 77, or we 
can even just compute a bound for the norm of these matrices by a simple power 
iteration. 

• If we are using a regular mesh, we can construct a cluster tree such that all clusters 
on a given level are identical up to translation. In this case, all transfer matrices 
and coupling matrices will also be identical on a level, so it suffices to carry out 
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each step of the setup only once per level instead of once per cluster. Under typical 
assumptions, this reduces the setup complexity to 0(k 3 log #7x). 

The experiments presented in the current paper use an W 2 -matrix approximation of 
the inverse. In order to improve the performance, it might make sense to replace the 
direct evaluation of the inverse by a iterative scheme with a preconditioner based on an 
■H 2 -LR or % 2 -Cholesky factorization [Jj. This approach would require us to compute 
the residual r = b — Ax of the linear system, and this residual will in general not be 
as smooth as the solution, so the strict accuracy conditions imposed by our algorithm 
would lead to the subtree T r corresponding to r becoming very large. Since the residual 
is only used to improve an approximate solution x, it might be possible to relax the 
accuracy conditions, e.g., by ensuring that T r is constructed from T x by refining each 
leaf cluster at most once. 
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